The effects of the COVID-19 pandemic on mental health have become an increasingly recognized issue in public discourse, yet mental health science perspectives remain relatively overlooked in the construction of research priorities and policy agendas related to recovery from the COVID-19 pandemic. Researchers recognize that the psychological and social effects of the COVID-19 pandemic on mental health are pervasive (Holmes & O’Connor et al., 2020) and that, without government support, individuals are at risk of bearing a “psychological scar” (Clemente-Suárez et al., 2020, p. 4) long after the pandemic ends. Therefore, coordination between governments and global research platforms to prioritize understanding mental health outcomes during the COVID-19 pandemic is necessary to mitigate mental health consequences for vulnerable groups under current and future pandemic conditions (Holmes & O’Connor et al., 2020).
The objective of this report is to inform readers about the impacts of COVID-19 on mental health in the United States. Although mental health is a global issue, the decentralized and inconsistent response to COVID-19 in the United States motivates us to explore populations in the United States as a case study for populations in countries implementing similar dynamic responses to the pandemic. The cyclic nature of lockdowns and control measures in the United States means Americans will continue to experience stressors associated with the COVID-19 pandemic. So, we hope that our findings will help advance knowledge about Americans’ vulnerability to negative mental health outcomes so that susceptible groups can be prioritized in government response and assistance programs related to COVID-19 or future pandemics.
To investigate the impact of COVID-19 on mental health in the United States, we explore data from the WHO Coronavirus Dashboard and the United States Census Bureau Household Pulse Survey (HPS). We also provide descriptive analysis of Google search queries related to anxiety and depression to supplement the HPS data and guide our hypotheses. We define our primary question of interest as whether an association exists between the number of confirmed COVID-19 cases and the frequency of indicators, or symptoms, of depressive or anxiety disorder. Furthermore, we define our secondary question of interest as, if an association exists between the number of confirmed COVID-19 cases and the frequency of indicators of depressive and anxiety disorder, whether a specific population in the United States experiences the greatest frequency of symptoms. To produce combined data sets for our analysis, we select the Date_reported, Country, New_cases, Cumulative_cases, New_deaths, and Cumulative_deaths variables from the WHO Coronavirus Dashboard and the Indicator, Group, Time Period, Time Period Label, Time Period Start Date, Time Period End Date, Value, Low CI, High CI, and Quartile Range variables from the HPS.
Prior analyses of HPS data suggest a disproportionate negative impact of COVID-19 stressors on emotional distress among older persons of color. (Bui et al., 2020) and a significant mental health burden stemming from job insecurity among young adults amidst the COVID-19 pandemic (Ganson et al., 2021). Consequently, we hypothesize that the frequency of indicators of depressive or anxiety disorder increase as the new case count associated with COVID-19 grows in the United States. We also expect that, when comparing populations defined by ethnicity, non-White groups will have the greatest frequency of symptoms. When comparing populations defined by age, we expect to find the highest frequency of symptoms related to depressive or anxiety disorder among young adults, but older adults may also be vulnerable to a greater risk of symptoms.
Recent findings on the impacts of COVID-19 on mental well-being motivate our analysis. On a global scale, researchers have analyzed data from Google Trends to test whether COVID-19 and the associated lockdowns implemented in Europe and America led to changes in well-being related topic search terms; the detection of a significant increase in searches for loneliness, worry, and sadness suggests that people’s mental health may have been severely affected by the pandemic and lockdown (Brodeur et al., 2020). Since data from before the pandemic and lockdowns began is necessary to fully assess how the COVID-19 affects population well-being, Brodeur et al. only analyzed data from Google Trends between January 1st 2019 and April 10th 2020 in countries that had introduced a full lockdown by the end of this period (Brodeur et al., 2020). Therefore, we can interpret Brodeur et al.’s finding of a significant increase in search terms related to poor mental well-being as being associated with the pandemic and lockdowns. Brodeur et al.’s findings suggest that we may expect a positive relationship between the number of COVID-19 cases and the frequency indicators of anxiety and depression in the United States when addressing our primary question of interest.
On a national scale, researchers have identified disparities in the degree of negative mental health consequences associated with COVID-19 across age groups and ethnicities. For example, research in the United States, United Kingdom, and Italy consistently shows that younger adults are more vulnerable to increased psychological distress than older adults (Abbott, 2020; Delmastro & Zamariola, 2020), and that within the United States, job insecurity associated with the pandemic might exacerbate symptoms of poor mental health among young people (Ganson et al., 2021). In addition, greater frequencies of negative health outcomes among minority ethnicities may stem from being disproportionately affected by COVID-19 containment policies (Abbott et al., 2021) or from expectations regarding future inability to recover lost economic resources in the future (Bui et al, 2020). Since these findings come from analysis on data from either large, representative samples or the HPS, we expect to find similar disparities across age or ethnicities when exploring our secondary question of interest.
We compiled and filtered data from the WHO COVID-19 Dashboard and the HPS to produce the data set for our model. The WHO Coronavirus Dashboard reflects laboratory-confirmed cases and deaths based upon WHO case definitions (World Health Organization [WHO], 2020). All WHO COVID-19 data represents the date of reporting as opposed to date of symptom onset and counts of new cases and deaths were calculated by subtracting previous cumulative total counts from the current count (WHO, 2020). The HPS was developed to help understand the social and economic impacts of COVID-19 on American households and designed to meet the goal of accurate and timely weekly estimates (Fields et al., 2020). HPS investigators expected low response rates, so they applied a weighting procedure within each state consisting of four adjustments that were applied to the base weights (Fields et al., 2020). Although the HPS produced responses related to many social and economic measures across three different geographic levels (estimates for the 15 largest Metropolitan Statistical Areas, state-level estimates for each of the 50 states and the District of Columbia, and national-level estimates), our predictive analysis is based on national-level estimates of the percentage of respondents who report experiencing symptoms associated with diagnoses of generalized anxiety disorder or major depressive disorder.
With so many datasets to compare it is important to process them carefully and consistently. The data collected from the WHO on Global COVID-19 deaths and cases is first reduced in scope by filtering out all countries except the United States. The data from the Household Pulse Survey (HPS) is looked at in multiple lights. Since the analysis to follow will be focused on country-wide trends, the HPS data is reduced to only contain the national measure for anxiety or depressive disorder. Later in this section this data will be separated and presented based on different groupings such as race, sex, and state. The data collected from Google Trends has already been reduced to covering the United States and so little pre-processing is required.
The HPS data is presented in the following three visuals by race, sex, and state. To aggregate the values in the data based on the selected groupings the mean was used. This data is already heavily pre-processed, so using the mean is the most reasonable as the data can be assumed to be normally distributed.
In the first figure, the racial groups with the lowest survey responses indicating Anxiety or Depression, by almost 10%, are Non-Hispanic white and Non-Hispanic Asian. In an analysis based on the sex, female respondents indicated anxiety or depression symptoms approximately 10% more than male respondents. The second figure shows a heat map of the United States where more yellow coloring represents low percentages of responses indicating anxiety or depression and more blue colors represent higher percentages of responses. It appears the northern-middle part of the country has lowest percentages of responses indicating anxiety or depression, by almost 10%.
To present these several datasets on the same time line, the time frame was restricted to the 2020 calendar year. Since all the datasets except for the WHO COVID-19 cases and deaths are recorded in constant time periods that are greater than a day, they were considered to be constant during their respective sampling period. For example, the Google trends data on Anxiety is recorded every week. The value recorded every week is assigned to every day of that week. This allows for simple visual comparisons as shown below.
Strangely, the month of May has a day with a negative number of deaths, which isn’t possible. The maximum number of deaths in a day is exactly 5,000 which is highly unlikely. It is possible that there were mistakes made in the data collection process and these instances were corrections. Further investigation outside this report is warranted by this strange behavior, but for now the first error is assumed to be a correction of an over-count and the second error is assumed to be an estimate of that day’s number of deaths due to COVID. Both of these can be considered a part of the noise that naturally exists in the data.
In order to create the linear model first the dates needed to be standardized. The CDC Depression and Anxiety data has 24 datapoints for distinct time periods. The WHO COVID-19 data has 427 datapoints for distinct days. The days in the WHO COVID-19 that math ot the time periods in the CDC Depression and Anxiety data were then compiled and averages were taken for new cases and new deaths. Doing this contrils for the time series variable in the data.
First plots were examined showing new COVID-19 cases or new COVID-19 deaths with values reported for symptoms of Anxiety Disorder or Depressive Disorder, this was done in order to visually assess if a linear model would be an appropriate fit to this data. After the plots were examined it was conlcuded that a linear model would provide, atleast initially, a suitable fit to the data presented.
The linear models built to examine new COVID-19 cases or new COVID-19 deaths and values reported for symptoms of Anxiety Disorder or Depressive Disorder assume that the relationship between \(X\) and \(Y\) is linear and that for any fixed value of \(X\), \(Y\) is normally distributed. This model also assumes homoscedasticity, that the variance of residuals is the same for any value of \(X\), and independence, that each observation is independent of each other. In order to put the values for \(X\) and \(Y\) on a similiar scale the log of \(Y\) was used in the model.
In order to determine if there was any relationship between new COVID-19 cases and values for symptoms reported of Anxiety Disorder or Depressive Disorder, the relationship between new COVID-19 deaths and values for symptoms reported of Anxiety Disorder or Depressive Disorder were also examined. In roder to examine these relationships a linear model was fit. A linear model is appropriate for these analyses because the mdoel is explaining a continuous variable, \(Y\), as a mathematical function of one or more variables, \(X\), so that we can use this regression model to predict the \(Y\) when only the \(X\) is known. The linear model is defined as \(Y = \beta_1 + \beta_2X + \epsilon\) where \(Y\) represents the Anxiety Disorder or Depressive Disorder value. \(\beta_1\) represents the intercept and \(\beta_2\) represents the slope. \(\epsilon\) represents the error term, the part of \(Y\) that the regression model is unable to explain. The \(H_0\) is that \(B_2=0\) and the \(H_A\) is that \(B_2\ne0\) meaning that there is a relationship between \(X\) and \(Y\).
This model assumes that the relationship between \(X\) and \(Y\) is linear and for ant fixed value of \(X\), \(Y\) is normally distributed. This model also assumes homoscedasticity, that the variance of residuals is the same for any value of \(X\), and independence, that each observation is independent of each other. In order to put the values for \(X\) and \(Y\) on a similiar scale the log of \(Y\) was used in the model.
##
## Call:
## lm(formula = log(cases) ~ anxiety_value, data = compiled_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.52978 -0.26460 -0.00436 0.22306 0.67797
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.65076 0.90521 0.719 0.48
## anxiety_value 0.26883 0.02369 11.349 1.15e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3119 on 22 degrees of freedom
## Multiple R-squared: 0.8541, Adjusted R-squared: 0.8475
## F-statistic: 128.8 on 1 and 22 DF, p-value: 1.151e-10
The reported summary for this model relating \(Y\), number of new covid cases, to \(X\) the value reported in the CDC Depression and Anxiety data, shows a positive linear relationship that is significant at p value = 0.05, from this information we can reject the null hypothesis, \(H_0\), which states that \(B_2=0\).
##
## Call:
## lm(formula = log(deaths) ~ anxiety_value, data = compiled_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9003 -0.4282 -0.1433 0.4638 0.9418
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.68769 1.55852 3.008 0.00648 **
## anxiety_value 0.06283 0.04078 1.541 0.13769
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.537 on 22 degrees of freedom
## Multiple R-squared: 0.09737, Adjusted R-squared: 0.05634
## F-statistic: 2.373 on 1 and 22 DF, p-value: 0.1377
The reported summary for this model relating \(Y\), number of new covid deaths, to \(X\) the value reported in the CDC Depression and Anxiety data, shows a positive linear relationship that is not significant at p value = 0.05, from this information we can accept the null hypothesis, \(H_0\), which states that \(B_2=0\).
The linear model built to examine the relationship of new COVID-19 cases versus values reported in the CDC Depression and Anxiety data shows that there is in fact a significant positive relationship between the two variables, this result allows for further analyses to examine if there were any differences in the CDC Depression and Anxiety data between races and age groups over the time period of the COVID-19 pandemic. Examining levels of anxiety and depression with an ANOVA between race groups and age groups will show if the pandemic effected the anxiety and depression levels of these race groups and age groups uniformly or disparately.
In order to determine if there were any differences in values reported for symptoms of Anxiety Disorder or Depressive Disorder across race group and across age group an ANOVA model was fit (one model for each). An ANOVA model is appropriate for this analysis because there are one categorical explanatory variables. The two-way ANOVA model is defined as \(Y_{ij} = \mu + \alpha_{i} + \epsilon_{ij}\) where: \(Y_{ij}\) represents the expected value for reported symptoms of Anxiety Disporder or Depressive Disorder \(i\) represents the race
\(j\) represents the \(j\)th oberservation for treatment \(i\) \(\mu{..}\) represents the expected mean over all races $ _{i}$ represents the mean adjustment for level \(i\) of race
\(\epsilon_{ij}\) represents the error for race \(i\), observation \(j\)
This model assumes that the mean of the response variable is influenced additively and linearly by the experimental factor, the errors of the model are independent, the errors have the same variance, and the errors are normally distributed. For this model that means that the value for reported symptoms of Anxiety Disorder or Depressive Disorder are influenced by race. For each observation, \(n\), in this model there is unexplained variation in the reported symptoms of Anxiety Disorder or Depressive Disorder from the calculated reported symptoms of Anxiety Disorder or Depressive Disorder, represented by \(Y_{ij}\), which is held in the error term. These deviations are asssumed to be independent of one another and normally distributed around a mean of 0.
## Indicator Group State Subgroup
## Length:120 Length:120 Length:120 Length:120
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Phase Time.Period Time.Period.Label Time.Period.Start.Date
## Length:120 Min. : 1.00 Length:120 Length:120
## Class :character 1st Qu.: 6.75 Class :character Class :character
## Mode :character Median :12.50 Mode :character Mode :character
## Mean :12.50
## 3rd Qu.:18.25
## Max. :24.00
## Time.Period.End.Date Value Low.CI High.CI
## Length:120 Min. :27.40 Min. :24.40 Min. :29.50
## Class :character 1st Qu.:35.60 1st Qu.:33.30 1st Qu.:37.08
## Mode :character Median :40.25 Median :38.10 Median :42.80
## Mean :39.91 Mean :37.29 Mean :42.56
## 3rd Qu.:44.02 3rd Qu.:41.00 3rd Qu.:47.40
## Max. :52.60 Max. :48.40 Max. :56.80
## Confidence.Interval Quartile.Range
## Length:120 Length:120
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## Df Sum Sq Mean Sq F value Pr(>F)
## Subgroup 4 3130 782.4 81.06 <2e-16 ***
## Residuals 115 1110 9.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## $emmeans
## Subgroup emmean SE df lower.CL
## Hispanic or Latino 43.3 0.634 115 42.0
## Non-Hispanic Asian, single race 32.3 0.634 115 31.1
## Non-Hispanic black, single race 40.9 0.634 115 39.6
## Non-Hispanic white, single race 36.3 0.634 115 35.1
## Non-Hispanic, other races and multiple races 46.8 0.634 115 45.6
## upper.CL
## 44.5
## 33.6
## 42.1
## 37.6
## 48.1
##
## Confidence level used: 0.95
##
## $contrasts
## contrast
## Hispanic or Latino - (Non-Hispanic Asian, single race)
## Hispanic or Latino - (Non-Hispanic black, single race)
## Hispanic or Latino - (Non-Hispanic white, single race)
## Hispanic or Latino - (Non-Hispanic, other races and multiple races)
## (Non-Hispanic Asian, single race) - (Non-Hispanic black, single race)
## (Non-Hispanic Asian, single race) - (Non-Hispanic white, single race)
## (Non-Hispanic Asian, single race) - (Non-Hispanic, other races and multiple races)
## (Non-Hispanic black, single race) - (Non-Hispanic white, single race)
## (Non-Hispanic black, single race) - (Non-Hispanic, other races and multiple races)
## (Non-Hispanic white, single race) - (Non-Hispanic, other races and multiple races)
## estimate SE df t.ratio p.value
## 10.95 0.897 115 12.215 <.0001
## 2.42 0.897 115 2.699 0.0602
## 6.95 0.897 115 7.754 <.0001
## -3.54 0.897 115 -3.949 0.0013
## -8.53 0.897 115 -9.515 <.0001
## -4.00 0.897 115 -4.460 0.0002
## -14.50 0.897 115 -16.164 <.0001
## 4.53 0.897 115 5.055 <.0001
## -5.96 0.897 115 -6.649 <.0001
## -10.50 0.897 115 -11.703 <.0001
##
## P value adjustment: tukey method for comparing a family of 5 estimates
The results of the ANOVA model comparing reported symptoms of Anxiety Disorder or Depressive Disorder between race groups indicates that there are significant difference between race groups. Non-Hispanic, other races and multiple races had the highest values, followed by Hispanic or Latino which was followed by, by not significantly different than, Non-Hispanic black, single race. The next highest rates were in Non-Hispanic white, single race, then Non-Hispanic Asian, single race. All differences were significant aside from the one mentioned.
## Indicator Group State Subgroup
## Length:168 Length:168 Length:168 Length:168
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Phase Time.Period Time.Period.Label Time.Period.Start.Date
## Length:168 Min. : 1.00 Length:168 Length:168
## Class :character 1st Qu.: 6.75 Class :character Class :character
## Mode :character Median :12.50 Mode :character Mode :character
## Mean :12.50
## 3rd Qu.:18.25
## Max. :24.00
## Time.Period.End.Date Value Low.CI High.CI
## Length:168 Min. :13.90 Min. :10.80 Min. :17.30
## Class :character 1st Qu.:26.38 1st Qu.:23.05 1st Qu.:28.80
## Mode :character Median :35.80 Median :34.05 Median :37.95
## Mean :35.15 Mean :32.91 Mean :37.50
## 3rd Qu.:43.00 3rd Qu.:41.17 3rd Qu.:44.62
## Max. :58.70 Max. :55.80 Max. :61.50
## Confidence.Interval Quartile.Range
## Length:168 Length:168
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## Df Sum Sq Mean Sq F value Pr(>F)
## Subgroup 6 18331 3055.2 307.9 <2e-16 ***
## Residuals 161 1597 9.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## $emmeans
## Subgroup emmean SE df lower.CL upper.CL
## 18 - 29 years 51.7 0.643 161 50.4 53.0
## 30 - 39 years 43.7 0.643 161 42.4 44.9
## 40 - 49 years 40.1 0.643 161 38.8 41.4
## 50 - 59 years 37.0 0.643 161 35.8 38.3
## 60 - 69 years 30.1 0.643 161 28.8 31.4
## 70 - 79 years 23.2 0.643 161 22.0 24.5
## 80 years and above 20.3 0.643 161 19.0 21.5
##
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## (18 - 29 years) - (30 - 39 years) 8.02 0.909 161 8.817 <.0001
## (18 - 29 years) - (40 - 49 years) 11.59 0.909 161 12.744 <.0001
## (18 - 29 years) - (50 - 59 years) 14.65 0.909 161 16.116 <.0001
## (18 - 29 years) - (60 - 69 years) 21.58 0.909 161 23.737 <.0001
## (18 - 29 years) - (70 - 79 years) 28.47 0.909 161 31.307 <.0001
## (18 - 29 years) - 80 years and above 31.43 0.909 161 34.570 <.0001
## (30 - 39 years) - (40 - 49 years) 3.57 0.909 161 3.927 0.0024
## (30 - 39 years) - (50 - 59 years) 6.64 0.909 161 7.300 <.0001
## (30 - 39 years) - (60 - 69 years) 13.57 0.909 161 14.920 <.0001
## (30 - 39 years) - (70 - 79 years) 20.45 0.909 161 22.491 <.0001
## (30 - 39 years) - 80 years and above 23.42 0.909 161 25.753 <.0001
## (40 - 49 years) - (50 - 59 years) 3.07 0.909 161 3.373 0.0159
## (40 - 49 years) - (60 - 69 years) 10.00 0.909 161 10.993 <.0001
## (40 - 49 years) - (70 - 79 years) 16.88 0.909 161 18.563 <.0001
## (40 - 49 years) - 80 years and above 19.85 0.909 161 21.826 <.0001
## (50 - 59 years) - (60 - 69 years) 6.93 0.909 161 7.621 <.0001
## (50 - 59 years) - (70 - 79 years) 13.81 0.909 161 15.191 <.0001
## (50 - 59 years) - 80 years and above 16.78 0.909 161 18.453 <.0001
## (60 - 69 years) - (70 - 79 years) 6.88 0.909 161 7.570 <.0001
## (60 - 69 years) - 80 years and above 9.85 0.909 161 10.833 <.0001
## (70 - 79 years) - 80 years and above 2.97 0.909 161 3.263 0.0224
##
## P value adjustment: tukey method for comparing a family of 7 estimates
The results of the ANOVA model comparing reported symptoms of Anxiety Disorder or Depressive Disorder between age groups indicates that there are significant difference between all age groups. Age group 18-29 years had the highest reported values for symptoms of Anxiety Disorder or Depressive Disorder followed by 30-39 years which was followed by 40-49 years. The four age groups following (50-59 years, 60-69 years, 70-79 years, and 80 years and above) had reported for symptoms of Anxiety Disorder or Depressive Disorder that decreased according to increasing age.
For our model we want to check the assumption of the linear regression model including: * Linear Relationship * Homoscedasticity * Independence * Normality
The raw case data in relation to anxiety value does not appear to be linear. The lm() line calculation does not fit all of the data evenly and has some tailing at the high end of cases and anxiety values. While there does appear to be a correlation, the relationship does not appear to be linear.
Upon an initial visualization to check for a linear relationship, we see that there is nothing to indicate otherwise. The data does not appear to be in any other form than linear. Log-transorming the data allows us to perform a linear regression as we cannot reject the assumption that the relationship is linear after tranforming the data.
In the residual plot, we want to check that the fitted residual line is mostly horizontal. We want our residual plot to have randomly distributed error terms with no clear pattern that are relatively close to this fitted line. In the model that relates log-transformed case counts and anxiety rates in the survey, we see this line is not perfectly horizontal, but nothing concerning to throw out the entire model. Some of the data is not spread evenly and there may be a couple outliers.
In the Normal Q-Q plot, we want to see that most of the data points are on the theoretical dashed line incicating normality. Our data here has a majority of the points near this line and has some tailing on either end of the distribution but nothing major. This would indicate normal distribution.
The Scale-Location plot, similar to the residual plot, wants a fitted line as horizontal as possible. Here we see similar results with no major deviations from a horizontal fitted line and a similar spread which should be looked into further via Cook’s distance to determine outliers.
In the Residuals vs leverage plot with Cook’s distance, we see that there are no points that pass the Cook’s distance line. Since no points are outside the line our model assumptions hold.
All our assumptions appear to hold for the model that related log-transformed case counts and anxiety rates in the country. While some data points come close to the Cook’s distance line of 0.5 and our residual plot is not as great as it could be, there is not strong enough evidence in these plots to reject the model and it’s assumptions.
One way to be more conclusive about the assumptions is to have more data points, the data we collected is only weekly and the case-counts were aggregated weekly to allow for a proper comparison. More data points may be helpful, but weekly case data tends to be similar to daily case data and may not end up having that much of an impact on the conclusions. However, more data points would allow the model validation to be improved. In addition a further analysis could be completed to see how values for Anxiety Disorder and Depression Disorder effect different populations such as looking at differences accross age groups or ethnicities.
Our analysis shows a relationship between the number of COVID-19 cases and indicators of depression and anxiety among people living in the United States. We aggregated the daily case counts from the WHO Coronavirus Dashboard with the frequency of anxiety and depression symptoms reported in the HPS data set, and then used this aggregated data to create a linear model to analyze the relationship between case rates and the rate of anxiety or depression symptoms. We found a relationship between these two variables with an extremely low p-value and a high adjusted \(R^2\) of 0.8475. The model assumptions appear to hold, so the low p-value suggests that we may conclusively reject the null hypothesis of no relationship existing between COVID-19 cases and indicators of anxiety or depression. In addition, the high Adjusted \(R^2\) suggests that COVID-19 cases and symptoms of depression or anxiety are highly correlated.
Caveats of our analysis include not being able to infer causality between COVID-19 cases and symptoms of depression or anxiety and not measuring the effects of potential confounders. Barriers to making causal statements arise from the lack of information produced by the HPS on respondents’ employment status, social support, family and living arrangements, pre-existing mental health symptoms and diagnoses, and community characteristics, all of which may be related to emotional distress (Bui et al., 2020; Ganson et al., 2021). Potential confounders that may be important to address in future analysis are the effects of season or pre-existing mental health conditions. For example, weather fluctuations might influence both COVID-19 transmission and the prevalence of seasonal affective disorder, so future analysis that incorporates seasonal effects such as temperature and hours of daylight might produce more precise results. Additionally, chronic stress associated with pre-existing mental health conditions weakens individuals’ immune systems and might lead to more frequent symptoms of depression or anxiety; therefore, future analysis incorporating effects of pre-existing mental health conditions such as biological or psychological markers of stress might also produce more precise results.
Depression and anxiety have been studied extensively and many factors are known to contribute to their development. Some of these factors, such as staying indoors extensively and loss of employment, are byproducts of the government restrictions and lockdowns due to COVID-19. Other factors such as the deaths of loved ones are unfortunately inevitable consequences of the COVID-19 pandemic. These external factors may further obfuscate the true cause of the increased frequency of depression and anxiety symptoms since the start of the pandemic, so it is important to recognize that we cannot conclude that COVID-19 cases cause increased depression or anxiety rates. However, we can acknowledge that the COVID-19 pandemic has introduced multifaceted challenges to healthcare and policy, and that mental health should not be ignored in efforts to understand and solve them.
Abbott, A. (2021). COVID’s mental-health toll: how scientists are tracking a surge in depression. Nature, 590(7845), 194–195. https://doi.org/10.1038/d41586-021-00175-z
Brodeur, A., Clark, A. E., Fleche, S., & Powdthavee, N. (2021). COVID-19, lockdowns and well-being: Evidence from Google Trends. Journal of Public Economics, 193, 104346. https://doi.org/10.1016/j.jpubeco.2020.104346
Bui, C. N., Peng, C., Mutchler, J. E., & Burr, J. A. (2020). Race and Ethnic Group Disparities in Emotional Distress Among Older Adults During the COVID-19 Pandemic. The Gerontologist, 61(2), 262–272. https://doi.org/10.1093/geront/gnaa217
Clemente-Suárez, V. J., Dalamitros, A. A., Beltran-Velasco, A. I., Mielgo-Ayuso, J., & Tornero-Aguilera, J. F. (2020). Social and Psychophysiological Consequences of the COVID-19 Pandemic: An Extensive Literature Review. Frontiers in Psychology, 11, 1–15. https://doi.org/10.3389/fpsyg.2020.580225
Delmastro, M., & Zamariola, G. (2020). Depressive symptoms in response to COVID-19 and lockdown: a cross-sectional study on the Italian population. Scientific Reports, 10(1), 1–10. https://doi.org/10.1038/s41598-020-79850-6
Fields JF, Hunter-Childs J, Tersine A, Sisson J, Parker E, Velkoff V, Logan C, and Shin H. Design and Operation of the 2020 Household Pulse Survey, 2020. U.S. Census Bureau. Forthcoming. Ganson, K. T., Tsai, A. C., Weiser, S. D., Benabou, S. E., & Nagata, J. M. (2021). Job Insecurity and Symptoms of Anxiety and Depression Among U.S. Young Adults During COVID-19. Journal of Adolescent Health, 68(1), 53–56. https://doi.org/10.1016/j.jadohealth.2020.10.008
Holmes, E. A., O’Connor, R. C., Perry, V. H., Tracey, I., Wessely, S., Arseneault, L., Ballard, C., Christensen, H., Cohen Silver, R., Everall, I., Ford, T., John, A., Kabir, T., King, K., Madan, I., Michie, S., Przybylski, A. K., Shafran, R., Sweeney, A., … Bullmore, E. (2020). Multidisciplinary research priorities for the COVID-19 pandemic: a call for action for mental health science. The Lancet Psychiatry, 7(6), 547–560. https://doi.org/10.1016/s2215-0366(20)30168-1
WHO COVID-19 Dashboard. Geneva: World Health Organization, 2020. Available online: https://covid19.who.int/
# Setup this markdown file:
knitr::opts_chunk$set(
fig.pos = 'H',
warning = FALSE,
message = FALSE,
eval = TRUE,
echo = FALSE,
fig.align='center'
)
# Include all necessary libraries
library(tidyverse)
library(usmap)
library(reshape2)
library(lubridate)
library(rstatix)
library(gt)
# Import WHO COVID Data, CDC Depression and Anxiety Data, and Google Trends
covid <- read.csv(file = "WHO-COVID-19-global-data.csv", fileEncoding="UTF-8-BOM")
indicators <- read.csv(file = "CDC_Anxiety.csv", header=T)
GOOGLE <- read.csv(file = "Google_US_Depression_Anxiety.csv")
# Convert date column to POSIXct
GOOGLE$Week <- as.POSIXct(GOOGLE$Week)
GOOGLE <- subset(GOOGLE, Week >= as.POSIXct("2020-01-01"))
# subset covid data to include only US data
covid_US = subset(covid, Country == "United States of America") %>% select(Date_reported,New_cases,Cumulative_cases,New_deaths,Cumulative_deaths)
covid_US$Date_reported <- as.POSIXct(covid_US$Date_reported, format = "%Y-%m-%d")
# subset CDC anxiety data to include only the measures for anxiety or depressive disorder (combined) and the national measure
indicators_US = subset(indicators, Indicator == "Symptoms of Anxiety Disorder or Depressive Disorder" &
Group == "National Estimate" ) %>% select(Time.Period,Time.Period.Label,Time.Period.Start.Date,Time.Period.End.Date,Value,Low.CI,High.CI,Quartile.Range)
# remove na data from the indicators leaving 24 values for 24 distinct time periods
indicators_US.narm <- indicators_US %>% filter(!is.na(Value))
# Convert start/end date to standard format.
indicators_US.narm$Time.Period.Start.Date <- as.POSIXct(indicators_US.narm$Time.Period.Start.Date, format = "%m/%d/%Y")
indicators_US.narm$Time.Period.End.Date <- as.POSIXct(indicators_US.narm$Time.Period.End.Date, format = "%m/%d/%Y")
# Present Depression or Anxiety Information by Race
indicators.race <- subset(indicators, Indicator == "Symptoms of Anxiety Disorder or Depressive Disorder" &
Group == "By Race/Hispanic ethnicity", c(Subgroup, Value)) %>% filter(!is.na(Value))
indicators.race$Subgroup <- as.character(indicators.race$Subgroup)
indicators.race.means <- aggregate(indicators.race, list(indicators.race$Subgroup), mean)
ggplot(indicators.race.means, aes(x=Value, y=Group.1, )) +
geom_bar(aes(fill=Group.1), stat = "identity", position = "dodge") +
theme(legend.position = "none") +
labs(title = "Depression or Anxiety Symptoms by Race",
subtitle = "Average 2020 Household Pulse Survey",
x = "Average % of Survey Responses Indicating Anxiety or Depression",
y = "")
# Present Depression or Anxiety Information by Sex
indicators.sex <- subset(indicators, Indicator == "Symptoms of Anxiety Disorder or Depressive Disorder" &
Group == "By Gender",c(Subgroup, Value)) %>% filter(!is.na(Value))
indicators.sex.means <- aggregate(indicators.sex, list(indicators.sex$Subgroup), mean)
# ggplot(indicators.sex.means, aes(x=Value, y=Group.1)) +
# geom_bar(aes(fill=Group.1), stat = "identity", position = "dodge") +
# theme(legend.position = "none") +
# labs(title = "Depression or Anxiety Symptoms by Sex",
# subtitle = "Average 2020 Household Pulse Survey",
# x = "Average % of Survey Responses Indicating Anxiety or Depression",
# y = "")
# Present Depression or Anxiety Information by State
indicators.state <- subset(indicators, Indicator == "Symptoms of Anxiety Disorder or Depressive Disorder" &
Group == "By State", c(Subgroup,Value)) %>% filter(!is.na(Value))
indicators.state.means <- aggregate(indicators.state, list(indicators.state$Subgroup), mean) %>% select(Group.1,Value)
indicators.state.means$Group.1 <- usmap::fips(indicators.state.means$Group.1)
names(indicators.state.means) <- c('fips','Value')
usmap::plot_usmap(data=indicators.state.means, values="Value") +
theme(legend.position = 'right') +
labs(title = "Depression or Anxiety Symptoms by State", subtitle = "Average 2020 Household Pulse Survey") +
scale_fill_continuous(low = "yellow", high = "blue", name = "Anxiety or Depression Percentage")
# Fill in dates of indicators and GOOGLE
HPS <- list()
for (i in 1:(dim(indicators_US.narm)[1]-1)) {
fill.dates <- seq(indicators_US.narm[i,'Time.Period.Start.Date'], indicators_US.narm[i+1,'Time.Period.Start.Date'] - 1*60*60*24, by="days")
fill.values <- c(rep(1,length(fill.dates)))*indicators_US.narm[i,'Value']
fill.dates.df <- data.frame(fill.dates,fill.values)
HPS[[i]] <- fill.dates.df
}
HPS.full <- bind_rows(HPS)
Trends.Anxiety <- list()
Trends.Depression <- list()
for (i in 1:(dim(GOOGLE)[1]-1)) {
fill.dates <- seq(GOOGLE[i,'Week'], GOOGLE[i+1,'Week'] - 1*60*60*24, by="days")
fill.anxiety.values <- c(rep(1,length(fill.dates)))*GOOGLE[i,'Anxiety...United.States.']
fill.depression.values <- c(rep(1,length(fill.dates)))*GOOGLE[i,'Depression...United.States.']
fill.dates.df <- data.frame(fill.dates,fill.anxiety.values)
Trends.Anxiety[[i]] <- fill.dates.df
fill.dates.df <- data.frame(fill.dates,fill.depression.values)
Trends.Depression[[i]] <- fill.dates.df
}
Trends.Anxiety.Full <- bind_rows(Trends.Anxiety)
Trends.Depression.Full <- bind_rows(Trends.Depression)
# Combine into one dataframe
all.covid <- covid_US %>% subset(Date_reported >= as.POSIXct("2020-04-23") & Date_reported < as.POSIXct("2021-01-01"),c(Date_reported,New_cases,New_deaths))
all.Anxiety <- Trends.Anxiety.Full %>% subset(fill.dates >= as.POSIXct("2020-04-23") & fill.dates < as.POSIXct("2021-01-01"))
all.Depression <- Trends.Depression.Full %>% subset(fill.dates >= as.POSIXct("2020-04-23") & fill.dates < as.POSIXct("2021-01-01"))
all.HPS <- HPS.full %>% subset(fill.dates >= as.POSIXct("2020-04-23") & fill.dates < as.POSIXct("2021-01-01"))
all.df <- data.frame(all.covid,all.Anxiety$fill.anxiety.values,all.Depression$fill.depression.values,all.HPS$fill.values)
names(all.df) <- c("Date","New Cases","New Deaths","Anxiety Google Trends","Depression Google Trends","Household Pulse Survey")
# Show daily deaths, cases, anxiety|depression, and google trends overtime
all.df.m <- melt(all.df %>% select(1:3), "Date")
# New_cases,New_deaths,all.Anxiety.fill.anxiety.values,all.Depression.fill.depression.values,all.HPS.fill.values
#options(repr.plot.width = 8, repr.plot.height = 0.75)
ggplot(all.df.m, aes(Date,value,colour = variable)) +
geom_point() +
facet_wrap(~ variable, ncol=1,scales = "free_y") +
labs(title = "COVID-19 Deaths and Cases",
subtitle = "WHO Data in 2020",
y = "") +
theme(text = element_text(size = 12), element_line(size = 0.2))
percentages <- data.frame(all.covid$Date,all.Anxiety$fill.anxiety.values,all.Depression$fill.depression.values,all.HPS$fill.values)
names(percentages) <-c("Date","Anxiety Google Trends","Depression Google Trends","Household Pulse Survey")
percent.m <- melt(percentages,"Date")
ggplot(percent.m, aes(Date,value,colour = variable)) +
geom_point() +
labs(title = "HPS Results and Google Trends",
subtitle = "HPS and Google Trends on 'Anxiety' and 'Depression' Topics in 2020",
y = "% of responders indicating symptoms and Google's weighted %") +
theme(text = element_text(size = 12), element_line(size = 0.2))
covid.US.Months <- data.frame(covid_US) %>% subset(Date_reported >= as.POSIXct("2020-01-01") & Date_reported < as.POSIXct("2021-01-01"),c(Date_reported,New_cases,New_deaths))
covid.US.Months$Date_reported <- format(covid.US.Months$Date_reported, "01-%m-%Y")
covid.US.Months.stats.cases <- covid.US.Months %>%
select(Date_reported,New_cases) %>%
group_by(Date_reported) %>%
get_summary_stats(type='common')
covid.US.Months.stats.deaths <- covid.US.Months %>%
select(Date_reported,New_deaths) %>%
group_by(Date_reported) %>%
get_summary_stats(type='common')
# covid.US.Months.stats.cases %>%
# select('Date_reported','min','max','median','mean','sd') %>%
# gt() %>%
# tab_header(
# title = "2020 COVID-19 Cases",
# subtitle = "Values per day of each month"
# ) %>%
# fmt_date(
# columns = vars(Date_reported),
# date_style = 11
# ) %>%
# fmt_number(
# columns = vars(min,max,median,mean,sd),
# decimals = 0
# )
#
# covid.US.Months.stats.deaths %>%
# select('Date_reported','min','max','median','mean','sd') %>%
# gt() %>%
# tab_header(
# title = "2020 COVID-19 Deaths",
# subtitle = "Values per day of each month"
# ) %>%
# fmt_date(
# columns = vars(Date_reported),
# date_style = 11
# ) %>%
# fmt_number(
# columns = vars(min,max,median,mean,sd),
# decimals = 0
# )
#plot looking at the cumulative deaths over the dates reported
qplot(Date_reported, Cumulative_deaths, data = covid_US)
#plot looking at the cases of covid over the dates reported
qplot(Date_reported, New_cases, data = covid_US)
#plot looking at the CDC anxiety data over the time reported
qplot(Time.Period, Value, data = indicators_US.narm)
dates1 = covid_US[112:124,]
dates2 = covid_US[126:131,]
dates3 = covid_US[133:138,]
dates4 = covid_US[140:145,]
dates5 = covid_US[147:152,]
dates6 = covid_US[154:159,]
dates7 = covid_US[161:166,]
dates8 = covid_US[168:173,]
dates9 = covid_US[175:180,]
dates10 = covid_US[182:187,]
dates11 = covid_US[189:194,]
dates12 = covid_US[196:201,]
dates13 = covid_US[230:242,]
dates14 = covid_US[244:256,]
dates15 = covid_US[258:270,]
dates16 = covid_US[272:284,]
dates17 = covid_US[286:298,]
dates18 = covid_US[300:312,]
dates19 = covid_US[314:326,]
dates20 = covid_US[328:340,]
dates21 = covid_US[342:354,]
dates22 = covid_US[370:382,]
dates23 = covid_US[384:396,]
dates24 = covid_US[398:410,]
dates1_cases = mean(dates1$New_cases)
dates1_deaths = mean(dates1$New_deaths)
dates2_cases = mean(dates2$New_cases)
dates2_deaths = mean(dates2$New_deaths)
dates3_cases = mean(dates3$New_cases)
dates3_deaths = mean(dates3$New_deaths)
dates4_cases = mean(dates4$New_cases)
dates4_deaths = mean(dates4$New_deaths)
dates5_cases = mean(dates5$New_cases)
dates5_deaths = mean(dates5$New_deaths)
dates6_cases = mean(dates6$New_cases)
dates6_deaths = mean(dates6$New_deaths)
dates7_cases = mean(dates7$New_cases)
dates7_deaths = mean(dates7$New_deaths)
dates8_cases = mean(dates8$New_cases)
dates8_deaths = mean(dates8$New_deaths)
dates9_cases = mean(dates9$New_cases)
dates9_deaths = mean(dates9$New_deaths)
dates10_cases = mean(dates10$New_cases)
dates10_deaths = mean(dates10$New_deaths)
dates11_cases = mean(dates11$New_cases)
dates11_deaths = mean(dates11$New_deaths)
dates12_cases = mean(dates12$New_cases)
dates12_deaths = mean(dates12$New_deaths)
dates13_cases = mean(dates13$New_cases)
dates13_deaths = mean(dates13$New_deaths)
dates14_cases = mean(dates14$New_cases)
dates14_deaths = mean(dates14$New_deaths)
dates15_cases = mean(dates15$New_cases)
dates15_deaths = mean(dates15$New_deaths)
dates16_cases = mean(dates16$New_cases)
dates16_deaths = mean(dates16$New_deaths)
dates17_cases = mean(dates17$New_cases)
dates17_deaths = mean(dates17$New_deaths)
dates18_cases = mean(dates18$New_cases)
dates18_deaths = mean(dates18$New_deaths)
dates19_cases = mean(dates19$New_cases)
dates19_deaths = mean(dates19$New_deaths)
dates20_cases = mean(dates20$New_cases)
dates20_deaths = mean(dates20$New_deaths)
dates21_cases = mean(dates21$New_cases)
dates21_deaths = mean(dates21$New_deaths)
dates22_cases = mean(dates22$New_cases)
dates22_deaths = mean(dates22$New_deaths)
dates23_cases = mean(dates23$New_cases)
dates23_deaths = mean(dates23$New_deaths)
dates24_cases = mean(dates24$New_cases)
dates24_deaths = mean(dates24$New_deaths)
dates = c("dates1", "dates2", "dates3", "dates4", "dates5", "dates 6", "dates7", "dates8", "dates9", "dates10", "dates11", "dates12", "dates13", "dates14", "dates15", "dates16", "dates17", "dates18", "dates19", "dates20", "dates21", "dates22", "dates23", "dates24")
cases = c(dates1_cases, dates2_cases, dates3_cases, dates4_cases, dates5_cases, dates6_cases, dates7_cases, dates8_cases, dates9_cases, dates10_cases, dates11_cases, dates12_cases, dates13_cases, dates14_cases, dates15_cases, dates16_cases, dates17_cases, dates18_cases, dates19_cases, dates20_cases, dates21_cases, dates22_cases, dates23_cases, dates24_cases)
deaths = c(dates1_deaths, dates2_deaths, dates3_deaths, dates4_deaths, dates5_deaths, dates6_deaths, dates7_deaths, dates8_deaths, dates9_deaths, dates10_deaths, dates11_deaths, dates12_deaths, dates13_deaths, dates14_deaths, dates15_deaths, dates16_deaths, dates17_deaths, dates18_deaths, dates19_deaths, dates20_deaths, dates21_deaths, dates22_deaths, dates23_deaths, dates24_deaths)
anxiety_value = indicators_US.narm$Value
#The compiled data has the date, the average number of cases for that date period, the average number of deaths for that date #period, and the reported CDC anxiety value for that data period.
compiled_data = data.frame(dates, cases, deaths, anxiety_value)
#Plot showing new cases vs anxiety value
plot(x=cases, y=anxiety_value, main="cases ~ anxiety_value")
#Plot showing new deaths vs anxiety value
plot(x=deaths, y=anxiety_value, main="deaths ~ anxiety_value")
linearMod_cases <- lm(log(cases) ~ anxiety_value, data=compiled_data)
summary(linearMod_cases)
#Basic linear model relating new deaths vs anxiety value, the reported summary shows that the slope (depression_value) is not #significant and positive for this relationship.
linearMod_deaths <- lm(log(deaths) ~ anxiety_value, data=compiled_data)
summary(linearMod_deaths)
covid_US
#subsetting the CDC anxiety data to include only the measures for anxiety or depressive disorder (combined) grouped by race
indicators
indicators_race = subset(indicators, Indicator == "Symptoms of Anxiety Disorder or Depressive Disorder" &
Group == "By Race/Hispanic ethnicity" )
indicators_race
indicators_race.narm <- indicators_race %>% filter(!is.na(Value))
indicators_race.narm
summary(indicators_race.narm)
anova_race = aov(Value ~ Subgroup, data = indicators_race.narm)
summary(anova_race)
library(emmeans)
emmeans(anova_race, pairwise ~ "Subgroup")
covid_US
#subsetting the CDC anxiety data to include only the measures for anxiety or depressive disorder (combined) grouped by age
indicators
indicators_age = subset(indicators, Indicator == "Symptoms of Anxiety Disorder or Depressive Disorder" &
Group == "By Age" )
indicators_age
indicators_age.narm <- indicators_age %>% filter(!is.na(Value))
indicators_age.narm
summary(indicators_age.narm)
anova_age = aov(Value ~ Subgroup, data = indicators_age.narm)
summary(anova_age)
emmeans(anova_age, pairwise ~ "Subgroup")
ggplot(data = compiled_data, aes(x = anxiety_value, y=cases)) +
geom_point(color = 'blue') +
geom_smooth(method = "lm", se = FALSE, color = 'red')
ggplot(data = compiled_data, aes(x = anxiety_value, y=log(cases))) +
geom_point(color = 'blue') +
geom_smooth(method = "lm", se = FALSE, color = 'red')
plot(linearMod_cases, which = 1)
plot(linearMod_cases, which = 2)
plot(linearMod_cases, which = 3)
plot(linearMod_cases, which = 5)